
    From-Below Boolean Matrix Factorization Algorithm Based on MDL

    Over the past few years, Boolean matrix factorization (BMF) has become an important direction in data analysis. The minimum description length (MDL) principle has been successfully adapted in BMF for model order selection. Nevertheless, a BMF algorithm that performs well with respect to the standard measures used in BMF is still missing. In this paper, we propose a novel from-below Boolean matrix factorization algorithm based on formal concept analysis. The algorithm uses the MDL principle as the criterion for factor selection. In various experiments we show that the proposed algorithm outperforms, from several standpoints, existing state-of-the-art BMF algorithms.
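
    The abstract does not spell out the procedure, but the general idea (greedily adding formal-concept-style factors that cover only 1s, for as long as an MDL-style cost keeps decreasing) can be illustrated with a toy sketch. The cost model, the attribute-seeded candidate generation, and the function names below are simplifying assumptions, not the algorithm from the paper.

import numpy as np

def closure(B, cols):
    """Rows having all attributes in `cols`, and all attributes shared by those rows."""
    rows = np.where(B[:, cols].all(axis=1))[0]
    if len(rows) == 0:
        return rows, np.array([], dtype=int)
    return rows, np.where(B[rows].all(axis=0))[0]

def mdl_cost(B, factors):
    """Toy two-part cost: bits to list each factor's rows and columns, plus bits per uncovered 1."""
    n, m = B.shape
    covered = np.zeros_like(B, dtype=bool)
    bits = 0.0
    for rows, cols in factors:
        covered[np.ix_(rows, cols)] = True
        bits += len(rows) * np.log2(n + 1) + len(cols) * np.log2(m + 1)
    residual = np.logical_and(B.astype(bool), ~covered).sum()
    return bits + residual * (np.log2(n + 1) + np.log2(m + 1))

def greedy_from_below_bmf(B, max_factors=10):
    """Greedily add rectangles of 1s (closures seeded by single attributes) while the cost drops."""
    factors = []
    best = mdl_cost(B, factors)
    for _ in range(max_factors):
        candidate, cand_cost = None, best
        for j in range(B.shape[1]):
            rows, cols = closure(B, np.array([j]))
            if len(rows) == 0:
                continue
            cost = mdl_cost(B, factors + [(rows, cols)])
            if cost < cand_cost:
                candidate, cand_cost = (rows, cols), cost
        if candidate is None:  # the MDL-style cost no longer improves: stop
            break
        factors.append(candidate)
        best = cand_cost
    return factors

B = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
print(greedy_from_below_bmf(B))

    Because every factor is a rectangle of 1s, the reconstructed matrix never covers a 0, which is what "from-below" refers to.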

    Numerical Pattern Mining Through Compression

    Pattern Mining (PM) has a prominent place in Data Science and finds applications in a wide range of domains. To avoid the exponential explosion of patterns, different methods have been proposed. They are based on assumptions about interestingness and usually return very different pattern sets. In this paper we propose to use a compression-based objective as a well-justified and robust interestingness measure. We define description lengths for datasets and use the Minimum Description Length (MDL) principle to find patterns that ensure the best compression. Our experiments show that applying MDL to numerical data yields a small and characteristic subset of patterns that describes the data in a compact way.
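
    For orientation, the generic two-part (crude) MDL criterion that such a compression-based objective instantiates is
    \[
      M^{*} \;=\; \operatorname*{arg\,min}_{M \in \mathcal{M}} \; \bigl( L(M) + L(D \mid M) \bigr),
    \]
    where L(M) is the number of bits needed to describe the pattern set (the model) and L(D | M) the number of bits needed to describe the data given that pattern set. The concrete description lengths for numerical data are defined in the paper itself and are not reproduced here.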

    On Entropy in Pattern Mining

    In this paper we consider different entropy-based approaches to Pattern Mining. We discuss how entropy can be defined on pattern sets and how it can be incorporated into different stages of mining, from computing candidates for interesting patterns to assessing the quality of pattern sets.
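
    The entropy referred to here is presumably the usual Shannon entropy: for a pattern set with usage or probability distribution p_1, ..., p_k,
    \[
      H \;=\; -\sum_{i=1}^{k} p_i \log_2 p_i .
    \]
    How this quantity is attached to candidate generation and to the assessment of pattern sets is exactly what the paper discusses; the specific definitions are not reproduced here.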

    On Coupling FCA and MDL in Pattern Mining

    Pattern Mining is a well-studied field in Data Mining and Machine Learning. Modern methods are based on dynamically updated models, among which MDL-based ones ensure high-quality pattern sets. Formal concepts also characterize patterns in a condensed form. In this paper we study the MDL-based algorithm Krimp in the FCA setting and propose a modified version that benefits from FCA and relies on the probabilistic assumptions that underlie MDL. We provide experimental evidence that the proposed approach improves the quality of the pattern sets generated by Krimp.
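
    As background (and as a simplified illustration only, not the modified version proposed in the paper), Krimp scores a code table by how many bits it takes to encode the database with it, with code lengths derived from usage frequencies. A minimal sketch, assuming a fixed cover order and ignoring the cost of encoding the code table itself:

from math import log2
from collections import Counter

def encoded_length(transactions, code_table):
    """Bits to encode `transactions` with `code_table` (a list of frozensets, tried in order)."""
    usage = Counter()
    for t in transactions:
        remaining = set(t)
        for X in code_table:            # greedy cover in the given order
            if X <= remaining:
                usage[X] += 1
                remaining -= X
        for item in remaining:          # fall back to singleton codes
            usage[frozenset([item])] += 1
    total = sum(usage.values())
    # Shannon-optimal code lengths derived from usage frequencies
    return sum(u * -log2(u / total) for u in usage.values())

db = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}]
ct = [frozenset({1, 2, 3}), frozenset({1, 2})]
print(encoded_length(db, ct))

    Krimp proper also counts the bits needed to transmit the code table and fixes a particular cover order; those details, and the FCA-based modification studied in the paper, are omitted here.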

    MDL for FCA: is there a place for background knowledge?

    The Minimum Description Length (MDL) principle is a powerful and well-founded approach that has been successfully applied in a wide range of Data Mining tasks. In this paper we address the problem of pattern mining with MDL. We discuss how constraints (background knowledge on the interestingness of patterns) can be embedded into MDL, and argue the benefits of MDL over a simple selection of patterns based on interestingness measures.
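
    One standard way to make this concrete (an illustration, not necessarily the encoding chosen in the paper) is to let the constraints restrict the class of admissible models, so that the selection becomes
    \[
      M^{*} \;=\; \operatorname*{arg\,min}_{M \in \mathcal{M}_{C}} \; \bigl( L(M) + L(D \mid M) \bigr),
    \]
    where \mathcal{M}_{C} contains only the pattern sets consistent with the background knowledge C.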

    What MDL can bring to Pattern Mining

    Gradual Discovery with Closure Structure of a Concept Lattice

    Approximate discovery of closed itemsets is usually based on either setting a frequency threshold or computing a sequence of projections. Both approaches, being incremental, do not provide any estimate of the size of the next output and do not ensure that "more interesting" patterns will be generated first. We propose to generate closed itemsets incrementally, with respect to the size of their smallest (cardinality-minimal, or minimum) generators, and show that this approach (i) has the anytime property, and (ii) generates itemsets of decreasing quality, so that more interesting patterns come first.
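
    The enumeration order can be mimicked with a rough sketch (not the paper's method): emit each closed itemset the first time it arises as the closure of a generator, processing generators level by level in order of increasing size, so that the process can be stopped after any level. The brute-force candidate generation, the helper names, and the toy data below are illustrative assumptions.

from itertools import combinations

def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset` (empty if none do)."""
    rows = [t for t in transactions if itemset <= t]
    if not rows:
        return frozenset()
    closed = set(rows[0])
    for t in rows[1:]:
        closed &= t
    return frozenset(closed)

def closed_by_generator_size(transactions, items, max_level=3):
    """Yield (level, closed itemset), where `level` is the size of its smallest generator."""
    seen = set()
    for level in range(1, max_level + 1):
        for gen in combinations(sorted(items), level):
            c = closure(frozenset(gen), transactions)
            if c and c not in seen:
                seen.add(c)
                yield level, c

db = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({2, 3})]
for level, c in closed_by_generator_size(db, {1, 2, 3}):
    print(level, sorted(c))

    Stopping after level k yields exactly the closed itemsets whose minimum generators have at most k items, which is what makes the enumeration anytime in spirit.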